50 research outputs found

Large-scale Hierarchical Alignment for Data-driven Text Rewriting

Author: Hahnloser Richard H. R.
Nikolov Nikola I.
Publication date: 01/01/2019

We propose a simple unsupervised method for extracting pseudo-parallel monolingual sentence pairs from comparable corpora representative of two different text styles, such as news articles and scientific papers. Our approach does not require a seed parallel corpus, but instead relies solely on hierarchical search over pre-trained embeddings of documents and sentences. We demonstrate the effectiveness of our method through automatic and extrinsic evaluation on text simplification from the normal to the Simple Wikipedia. We show that pseudo-parallel sentences extracted with our method not only supplement existing parallel data, but can even lead to competitive performance on their own. Comment: RANLP 201
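The hierarchical search mentioned in the abstract can be illustrated with a minimal sketch: first match each source document to its most similar target document, then match sentences only within that document pair. This is not the authors' implementation. The function names, the dictionary layout, and both similarity thresholds are hypothetical, and the embeddings are assumed to be precomputed vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, candidates):
    """Return (index, similarity) of the candidate most similar to query."""
    scores = [cosine(query, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def hierarchical_align(src_docs, tgt_docs, doc_threshold=0.5, sent_threshold=0.8):
    """Extract pseudo-parallel sentence pairs in two stages.

    Each document is a dict {"doc_vec": [...], "sent_vecs": [[...], ...]}.
    Both thresholds are illustrative assumptions, not values from the paper.
    Returns tuples (src_doc, src_sent, tgt_doc, tgt_sent) of indices.
    """
    pairs = []
    if not tgt_docs:
        return pairs
    tgt_doc_vecs = [t["doc_vec"] for t in tgt_docs]
    for i, src in enumerate(src_docs):
        # Stage 1: document-level nearest-neighbor search.
        j, doc_sim = nearest(src["doc_vec"], tgt_doc_vecs)
        if doc_sim < doc_threshold or not tgt_docs[j]["sent_vecs"]:
            continue
        # Stage 2: sentence-level search restricted to the matched document.
        for s, sent_vec in enumerate(src["sent_vecs"]):
            t, sent_sim = nearest(sent_vec, tgt_docs[j]["sent_vecs"])
            if sent_sim >= sent_threshold:
                pairs.append((i, s, j, t))
    return pairs
```

Restricting the sentence search to one matched document keeps the number of sentence comparisons small, which is the main appeal of a hierarchical scheme over comparing every sentence with every other sentence.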

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

ZORA

Character-level Chinese-English Translation through ASCII Encoding

Author: Hahnloser Richard H. R.
Hu Yuhuang
Nikolov Nikola I.
Tan Mi Xue
Publication date: 01/01/2018

Character-level Neural Machine Translation (NMT) models have recently achieved impressive results on many language pairs. They mainly do well for Indo-European language pairs, where the languages share the same writing system. However, for translating between Chinese and English, the gap between the two different writing systems poses a major challenge because of a lack of systematic correspondence between the individual linguistic units. In this paper, we enable character-level NMT for Chinese by breaking down Chinese characters into linguistic units similar to those of Indo-European languages. We use the Wubi encoding scheme, which preserves the original shape and semantic information of the characters, while also being reversible. We show promising results from training Wubi-based models on the character- and subword-level with recurrent as well as convolutional models. Comment: 7 pages, 3 figures, 3rd Conference on Machine Translation (WMT18), 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Repository for Publications and Research Data

Crossref

ZORA

Embedding-based Scientific Literature Discovery in a Text Editor Application

Author: Gu Nianlong
Gökçe Onur
Hahnloser Richard H.R.
Nikolov Nikola I.
Prada Jonathan
Publication date: 11/05/2020

Each claim in a research paper requires all relevant prior knowledge to be discovered, assimilated, and appropriately cited. However, despite the availability of powerful search engines and sophisticated text editing software, discovering relevant papers and integrating the knowledge into a manuscript remain complex tasks associated with high cognitive load. Defining comprehensive search queries requires strong motivation from authors, irrespective of their familiarity with the research field. Moreover, switching between independent applications for literature discovery, bibliography management, reading papers, and writing text burdens authors further and interrupts their creative process. Here, we present a web application that combines text editing and literature discovery in an interactive user interface. The application is equipped with a search engine that couples Boolean keyword filtering with nearest neighbor search over text embeddings, providing a discovery experience tuned to an author's manuscript and their interests. Our application aims to take a step towards more enjoyable and effortless academic writing. The demo of the application (https://SciEditorDemo2020.herokuapp.com/) and a short video tutorial (https://youtu.be/pkdVU60IcRc) are available online.
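The coupling of Boolean keyword filtering with nearest neighbor search described in the abstract can be sketched as a two-step query: filter candidates by required and excluded keywords, then rank the survivors by embedding similarity. This is an illustrative sketch only, not the application's actual search engine. The `search` function, its parameters, and the paper record layout are hypothetical, and the embeddings are assumed to be precomputed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(papers, query_vec, required=(), excluded=(), k=3):
    """Return titles of the top-k papers that pass the keyword filter.

    papers: list of dicts {"title": str, "text": str, "vec": [...]}
    (a hypothetical record layout). A paper passes the Boolean filter if it
    contains every required word and none of the excluded words.
    """
    required = [w.lower() for w in required]
    excluded = [w.lower() for w in excluded]

    def passes_filter(text):
        words = set(text.lower().split())
        return all(w in words for w in required) and not any(
            w in words for w in excluded
        )

    # Step 1: Boolean keyword filtering.
    hits = [p for p in papers if passes_filter(p["text"])]
    # Step 2: rank the remaining papers by similarity to the query embedding.
    hits.sort(key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
    return [p["title"] for p in hits[:k]]
```

In a setting like the one the abstract describes, `query_vec` would plausibly be an embedding of the manuscript text being edited, so the ranking follows the manuscript while the keywords act as hard constraints.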

arXiv.org e-Print Archive

Repository for Publications and Research Data

ZORA

Character-Level Translation with Self-attention

Author: Gao Yingqiang
Hahnloser Richard H. R.
Hu Yuhuang
Nikolov Nikola I.
Publication date: 01/01/2020

We explore the suitability of self-attention models for character-level neural machine translation. We test the standard transformer model, as well as a novel variant in which the encoder block combines information from nearby characters using convolutions. We perform extensive experiments on WMT and UN datasets, testing both bilingual and multilingual translation to English using up to three input languages (French, Spanish, and Chinese). Our transformer variant consistently outperforms the standard transformer at the character level and converges faster while learning more robust character-level alignments. Comment: ACL 202

arXiv.org e-Print Archive

Repository for Publications and Research Data

Crossref

ZORA

Improving efficiency of supercontinuum generation in photonic crystal fibers by direct degenerate four-wave-mixing

Author: Alfano
Alfano
Anders Bjarklev
Baldeck
Birks
Blow
Coen
Coen
Dinda
Dudley
Ferrando
Ferrando
Garnier
Hansen
Hedekvist
Herrmann
Husakou
Johnson
Karlsson
Knapp
Knight
Kuwaki
Lin
Mori
Mori
Nikola I. Nikolov
Ole Bang
Pole
Ranka
Reeves
Tamura
Thorkild Sørensen
Wadsworth
Wai
Publication venue: 'The Optical Society'
Publication date: 01/01/2002

We numerically study supercontinuum (SC) generation in photonic crystal fibers pumped with low-power 30-ps pulses close to the zero dispersion wavelength of 647 nm. We show how the efficiency is significantly improved by designing the dispersion to allow widely separated spectral lines generated by degenerate four-wave-mixing (FWM) directly from the pump to broaden and merge. By proper modification of the dispersion profile, the generation of additional FWM Stokes and anti-Stokes lines results in efficient generation of an 800 nm wide SC. Simulations show that the predicted efficient SC generation is more robust and can survive fiber imperfections modelled as random fluctuations of the dispersion coefficients along the fiber length. Comment: Submitted to Journal of the Optical Society of America B on 16 September 200

arXiv.org e-Print Archive

Crossref

CERN Document Server

Online Research Database In Technology

Quadratic solitons as nonlocal solitons

Author: A. Mamaev
A.G. Litvak
A.V. Buryak
C. Conti
C.R. Menyuk
D. Suter
D.W. McLaughlin
Dragomir Neshev
E.M. Conwell
F. Wise
G. Assanto
G.F. Calvo
I.V. Shadrivov
J. Wyller
L. Bergé
M. Peccianti
Nikola I. Nikolov
O. Bang
Ole Bang
R. Schiek
W. Krolikowski
W.E. Torruellas
Wiesław Z. Królikowski
Yu.S. Kivshar
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2003

We show that quadratic solitons are equivalent to solitons of a nonlocal Kerr medium. This provides new physical insight into the properties of quadratic solitons, often believed to be equivalent to solitons of an effective saturable Kerr medium. The nonlocal analogy also allows for novel analytical solutions and the prediction of novel bound states of quadratic solitons. Comment: 4 pages, 3 figures

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

The genetic history of the Southern Arc: a bridge between West Asia and Europe

Author: Acar Ayşe
Agelarakis Anagnostis
Aghikyan Levon
Akyüz Uğur
Alpaslan-Roodenberg Songül
Andreeva Desislava
Andrijašević Gojko
Antonović Dragana
Armit Ian
Atmaca Alper
Avetisyan Pavel
Aytek Ahmet İhsan
Açıkkol Ayşen
Bacvarov Krum
Badalyan Ruben
Bakardzhiev Stefan
Balen Jacqueline
Bejko Lorenc
Bernardos Rebecca
Bertsatos Andreas
Biber Hanifi
Bilir Ahmet
Bodružić Mario
Bonogofsky Michelle
Bonsall Clive
Borić Dušan
Borovinić Nikola
Bravo Morante Guillermo
Buttinger Katharina
Callan Kim
Candilio Francesca
Carić Mario
Cheronet Olivia
Chohadzhiev Stefan
Chovalopoulou Maria-Eleni
Chryssoulaki Stella
Ciobanu Ion
Constantinescu Mihai
Cristiani Emanuela
Culleton Brendan J.
Curtis Elizabeth
Davis Jack
Demcenco Tatiana I.
Deniz Kesici Seda
Dergachev Valentin
Derin Zafer
Deskaj Sylvia
Devejyan Seda
Djordjević Vojislav
Duffett Carlson Kellie Sara
Eccles Laurie R.
Elenski Nedko
Engin Atilla
Erdoğan Nihat
Erir-Pazarcı Sabiha
Fernandes Daniel M.
Ferry Matthew
Freilich Suzanne
Frînculeasa Alin
Galaty Michael L.
Gamarra Beatriz
Gasparyan Boris
Gaydarska Bisserka
Genç Elif
Gültekin Timur
Gündüz Serkan
Hajdu Tamás
Heyd Volker
Hobosyan Suren
Hovhannisyan Nelli
Iliev Iliya
Iliev Lora
Iliev Stanislav
İvgin İlkay
Janković Ivor
Jovanova Lence
Karkanas Panagiotis
Kavaz-Kındığılı Berna
Kaya Esra Hilal
Keating Denise
Kennett Douglas J.
Khudaverdyan Anahit
Kılıç Sinan
Kiss Krisztián
Klostermann Paul
Kostak Boca Negra Valdes Sinem
Kovačević Saša
Krenz-Niedbała Marta
Krznarić Škrivanko Maja
Kurti Rovena
Kuzman Pasko
Lawson Ann Marie
Lazar Catalin
Lazaridis Iosif
Leshtakov Krassimir
Levy Thomas E.
Liritzis Ioannis
Lorentz Kirsi O.
Mah Matthew
Mallick Swapan
Mandl Kirsten
Martirosyan-Olshansky Kristine
Matthews Roger
Matthews Wendy
McSweeney Kathleen
Melikyan Varduhi
Micco Adam
Michel Megan
Milašinović Lidija
Mittnik Alissa
Monge Janet M.
Nekhrizov Georgi
Nicholls Rebecca
Nikitin Alexey G.
Nikolov Vassil
Novak Mario
Olalde Iñigo
Oppenheimer Jonas
Osterholtz Anna
Papadimitriou Nikos
Papakonstantinou Niki
Papathanasiou Anastasia
Paraman Lujana
Paskary Evgeny G.
Patterson Nick
Petrakiev Ilian
Petrosyan Levon
Petrova Vanya
Philippa-Touchais Anna
Piliposyan Ashot
Pinhasi Ron
Pocuca Kuzman Nada
Potrebica Hrvoje
Preda-Bălănică Bianca
Premužić Zrinka
Price T. Douglas
Qiu Lijun
Radović Siniša
Raeuf Aziz Kamal
Rajić Šikanjić Petra
Rasheed Raheem Kamal
Razumov Sergei
Reich David
Richardson Amy
Rohland Nadin
Roodenberg Jacob
Ruka Rudenc
Russeva Victoria
Savaş Emre
Schattke Constanze
Schepartz Lynne
Selçuk Tayfun
Sevim-Erol Ayla
Shamoon-Pour Michel
Shephard Henry M.
Sideris Athanasios
Simalcsik Angela
Simonyan Hakob
Sinika Vitalij
Sirak Kendra
Sirbu Ghenadie
Soficaru Andrei
Sołtysiak Arkadiusz
Stathi Maria
Steskal Martin
Stewardson Kristin
Stocker Sharon
Suata-Alpaslan Fadime
Suvorov Alexander
Szeniczey Tamás
Szécsényi-Nagy Anna
Sönmez-Sözer Çilem
Söğüt Bilal
Telnov Nikolai
Temov Strahil
Todorova Nadezhda
Tota Ulsi
Touchais Gilles
Triantaphyllou Sevi
Türker Atila
Ugarković Marina
Valchev Todor
Veljanovska Fanica
Videvski Zlatko
Virag Cristian
Wagner Anna
Walsh Sam
Workman J. Noah
Włodarczak Piotr
Yardumian Aram
Yarovoy Evgenii
Yavuz Alper Yener
Yılmaz Hakan
Zalzala Fatma
Zettl Anna
Zhang Zhao
Çavuşoğlu Rafet
Özdemir Celal
Özdoğan Kadir Toykan
Öztürk Nurettin
Čondić Natalija
Łukasik Sylwia
Şahin Mustafa
Şarbak Ayşegül
Šlaus Mario
Publication venue: 'American Association for the Advancement of Science (AAAS)'
Publication date: 01/01/2022

By sequencing 727 ancient individuals from the Southern Arc (Anatolia and its neighbors in Southeastern Europe and West Asia) over 10,000 years, we contextualize its Chalcolithic period and Bronze Age (about 5000 to 1000 BCE), when extensive gene flow entangled it with the Eurasian steppe. Two streams of migration transmitted Caucasus and Anatolian/Levantine ancestry northward, and the Yamnaya pastoralists, formed on the steppe, then spread southward into the Balkans and across the Caucasus into Armenia, where they left numerous patrilineal descendants. Anatolia was transformed by intra–West Asian gene flow, with negligible impact of the later Yamnaya migrations. This contrasts with all other regions where Indo-European languages were spoken, suggesting that the homeland of the Indo-Anatolian language family was in West Asia, with only secondary dispersals of non-Anatolian Indo-Europeans from the steppe.

Central Archive at the University of Reading

İstanbul Üniversitesi Açık Erişim Sistemi

Abstractive Document Summarization in High and Low Resource Settings

Author: Nikolov Nikola I.
Publication venue: ETH Zurich
Publication date: 01/05/2020

Automatic summarization aims to reduce an input document to a compressed version that captures only its salient parts. It is a topic with growing importance in today's age of information overflow. There are two main types of automatic summarization. Extractive summarization only selects salient sentences from the input, while abstractive summarization generates a summary without explicitly re-using whole sentences, resulting in summaries that are often more fluent. State-of-the-art approaches to abstractive summarization are data-driven, relying on the availability of large collections of paired articles with summaries. The pairs are typically manually constructed, a task which is costly and time-consuming. Furthermore, when targeting a slightly different domain or summary format, a new parallel dataset is often required. This heavy reliance on parallel resources limits the potential impact of abstractive summarization systems in society. In this thesis, we consider the problem of abstractive summarization from two different perspectives: high-resource and low-resource summarization. In the first part, we compare different methods for data-driven summarization, focusing specifically on the problem of generating long, abstractive summaries, such as an abstract for a scientific journal article. We discuss the difficulties that come with abstractive generation of long summaries and propose methods for alleviating them. In the second part of this thesis, we develop low-resource methods for abstractive text rewriting, first focusing on individual sentences and then on whole summaries. Our methods do not rely on parallel data, but instead utilize raw non-parallel text collections. Overall, this work makes a step towards data-driven abstractive summarization for the generation of long summaries, without having to rely on vast amounts of parallel, manually curated data.

Repository for Publications and Research Data

Abstractive Document Summarization without Parallel Data

Author: Hahnloser Richard H R
Nikolov Nikola I
Publication venue: European Language Resources Association
Publication date: 01/01/2020

Abstractive summarization typically relies on large collections of paired articles and summaries. However, in many cases, parallel data is scarce and costly to obtain. We develop an abstractive summarization system that relies only on large collections of example summaries and non-matching articles. Our approach consists of an unsupervised sentence extractor that selects salient sentences to include in the final summary, as well as a sentence abstractor, trained on pseudo-parallel and synthetic data, that paraphrases each of the extracted sentences. We perform an extensive evaluation of our method: on the CNN/DailyMail benchmark, on which we compare our approach to fully supervised baselines, as well as on the novel task of automatically generating a press release from a scientific journal article, which is well suited for our system. We show promising performance on both tasks, without relying on any article-summary pairs. Comment: LREC 202

arXiv.org e-Print Archive

Repository for Publications and Research Data

ZORA